Category Archives: Misc Algorithms

Sudoku Solver using Simulated Annealing

Here is my failed attempt to solve Sudoku using Simulated Annealing. ( I think it is good to fail – it makes me learn 🙂 )

Input to the algorithm – initial Sudoku board configuration – set of 81 numbers – where a missing number in the board is indicated by -1.

Algorithm :

We choose random numbers between (1-9) and allocate it to 2 random ( not fixed i.e not already given numbers in the input ) cells on each sub-square* – making sure that each sub-square has no repeated integers.

* There are 9 sub-squares in the Sudoku board

The cost function for the algorithm is the number of integers between (1-9) which are not used per row and per column and sum those values across the grid.

The algorithm then tries to minimize the above cost function and then it follows the general Simulated Annealing pattern of reducing the temperature.

The Alpha value chosen by the algorithm is .94

However, using the above value of Alpha – algorithm has not converged to the solution.

The output of the algorithm prints out the resultant Grid and the best possible result.
The output also includes the number of Integers which are still mis-positioned.

Most of the runs of the algorithm end up between 5-6-7 numbers still out of place.

Command to run the code

python SudokuSolver -1 -1 -1 -1 -1 7 5 -1 4 -1 3 6 4 -1 -1 -1 -1 2 -1 -1 -1 -1 6 -1 -1 7 8 -1 -1 -1 5 1 -1 2 -1 -1 -1 -1 -1 -1 -1 9 -1 -1 -1 -1 -1 -1 2 -1 4 -1 -1 1 -1 -1 -1 8 9 -1 -1 -1 6 1 5 -1 7 3 -1 -1 -1 -1 -1 2 9 1 -1 5 -1 8 3

Code

Advertisements

Skyline Algorithm – Divide and Conquer

Give a list of tuples – where each tuple has 3 value – (x1, x2, h1) – each tuple represents a rectangular block of a building in the city skyline.

x1 – Start Coordinate of a Block on X Axis
x2 – End Coordinate of a Block on X Axis
h1 – Height of the block

Once a list of such tuples is given he problem is to find out the general skyline of the City int he 2D plane – where 1 building’s coordinates could be overlapping with another building and changing the skyline from a 2D perspective

For Example if we have a set of tuples as below

(1,5,10)
(2,7,9)
(4,15,12)
(6,9,15)

The general skyline algorithm should return

(1,4,10), (4,6,12), (6,9,15), (9,15,12)

Algorithm
I use a recursive Divide and Conquer Algorithm where I break up the original list into a list of smaller tuples.
When there is only 1 or 2 tuple in a list – we call the merge() routine to merge 2 tuples.

The results are printed as a set of blocks using matplotlib graphing library

Code

https://github.com/palsumitpal/sumitpython/blob/master/SkylineNew.py

Command Line
python SkyLineNew.py 1 4 10 4 6 12 6 9 15 9 15 12

// where you have a set of 3 numbers – which represents a tuple and
// have any number of such tuples

// the program exits if any number is a -ve number and if the
// number of entries is not a multiple of 3

The code finally generates a graph of the Skyline using matplotlib python library – something like the one below

SkyLine

Parallel MSB RadixSort

Here is a variation of MSB Radix Sort that I have thought and implemented. This is an embarrassingly parallel algorithm that can be easily translated to the Map Reduce paradigm.

I have implemented this algorithm using Java7’s Fork Join Algorithm.
Also implemented are 2 versions of python code – one Recursive and another Iterative which are NOT implemented using any parallelizable code in python.

Core Ideas of the Algorithm

  1. Partition Numbers into Buckets based on the number of digits : each bucket is then worked in parallel
  2. Intra Bucket Parallelism: Numbers in the same bucket (i.e –numbers with the same number of digits) – their sorting can be further parallelized by making 10 more buckets based on their ith –MSB digit. This can happen recursively – giving a huge amount of intra-bucket parallelism.

Java Code usage

Dependencies – SLf4J and Perf4J and Javolution libraries

java MSBRadixSortingWithForkJoin <Number of Items that you would like to Sort>
#Items that you would like to Sort – Integer number. The code internally generates that many random numbers

https://github.com/palsumitpal/JavaCode/blob/master/MSBRadixSort/MSBRadixSortingWithForkJoin.java

Python Code Usage

python MSBRadixSortIterative.py filename
python MSBRadixSortRecursive.py fileName

FileName – name of a file – which has 1 integer / line which needs to be sorted

https://github.com/palsumitpal/JavaCode/blob/master/MSBRadixSort/MSBRadixSortIterative.py

https://github.com/palsumitpal/JavaCode/blob/master/MSBRadixSort/MSBRadixSortRecursive.py

Binarization Using Map Reduce

Code Available at – https://github.com/palsumitpal/JavaCode/tree/master/Binarization

Description

Binarization is often needed for Machine Learning Algorithms as a preprocessing step on input data before application of the algorithms. Here is my initial attempt to write a Map Reduce code for Hadoop to achieve binarization.

The code at Github can be built using the pom.xml file

Command Line

hadoop  jar binarization.jar hivetableinput hivecolumnmetatest hivebinarizeouttest \N ? ColumnNames.txt 1 2 4 5 6 7 11 16 17 18 19 20 27

Command Line Parameter

1st Param — InputDirectory for files to be binarized

2nd Param — Directory where ColumnMetaData is generated

3rd Param — Directory of output of the Binarization

4th Param — Missing Column Value Indicator Character in Input files

5th Param — Missing Column Value Indicator Character in Output files

6th Param — Column Names file in the input data set(1 Column Name per line — this is generated in the output binarization file as header)

7th Param… — the Column numbers (starting from 1) from the Input File ( as mentioned in 6th Param ) — which need to be binarized

Assumptions

All Columns which are not in the 7th param onwards ( but are part of the input file ) they are not binarized – but appended to the end columns of the output file.

Example

If you have an input file with the following columns — Col1, Col2, Col3, Col4, Col5

You want to binarize the following columns — Col1, Col4, Col5

Then only the contents of Col1, Col4, Col5 — will be binarized and outputted in the output file

However, the values of Col2, Col3 will appear in the output file but as the end columns

If

Col1 — Has Distinct Values as C11, C12, C13

Col4 — Has Distinct Values as C41, C42

Col5 — Has Distinct Values as C51, C52, C53, C54, C55

Then the output format will be like this

C11, C12, C13, C41, C42, C51, C52, C53, C54, C55, Col2, Col3

C11, C12, C13, C41, C42, C51, C52, C53, C54, C55

Will have values 1 or 0  — Since they are the ones that is binarized

Col2, Col3 will have original values from the input file

Remember the #of Rows in Output File — will be the same as the #Of Rows in Input file

Program Description

The MR program is composed of 2 jobs.

ASSUME – THE Input DATA FILE HAS NO HEADERS

a1      b1      c1      d1

a2      b1      c1      d3

a1      b2      c2      d1

a3      b1      c4      d2

a1      b2      c3      d1

a2      b1      c1      d1

JOB1

Output of the Mapper

1      [a1, a2, a1, a3, a1, a2]

Key Column#

Value – List of Values in Column#

Output from Reducer

1,a1, a2, a3

2,b1, b2

3,c1, c2, c4, c3

4,d1, d2

Key – Column#

Value – List of Unique Values in Column#

 

The above is written to the ColumnMeta File — this file — will be used in Job2 – This file – contains data for those columns which are mentioned from Param 7 onwards in the input command line

1st Job – Output is in 2nd Param directory

This directory also contains the header String for the final output to be generated after 2nd Job is run – this is in a file named – columnHeader.txt

The headers are generated using ColumnNames from the – 6th Param File ( which is the column Names file) + The Unique values in each Column

This directory also contains the ColumnMetaData – which is the output of the Reduce Step

2nd Job – Map Only Job

This generates the actual Binarized output file.

In order to get a file which has the header and the binarized output – concatenate the file columnHeader.txt ( from the output _directory of 1st Job) with the contents of the file in this directory.

Enhancements which can be made to this Program

  • Checking of Parameters and writing out error message
  • Error Handling
  • Input data can be from Hive / HBase / Impala
  • Test Cases

 Testing

Here is a Python code – which can test the results of the output of the Binarization Code using Map Reduce above – with something you may have generated results from some other program.

https://github.com/palsumitpal/JavaCode/blob/master/Binarization/CompareBinarizeResults.py

The python program needs the following inputs

Param1 – Name of 1st Output file

Param2 – Name of 2nd Output file

Mapping File – which maps which column in file in Param1 – matches with which column in file in Param2 – the format of this file is Col#FromFile1 TAB Col#FromFile2

https://github.com/palsumitpal/JavaCode/blob/master/Binarization/MappingHeader.txt

An Example Mapping file is also posted at the github location

Solving Puzzle using Simulated Annealing

Look at the Picture below

photo (1)

There are NxM square tiles – each edge of which is colored using a different color.

If 2 adjacent tiles have the same color facing each other – we add a point.

The problem is to Swap the tiles so that we get the maximum number of points. Swapping means that 1 tile is exchanged with another tile without changing the orientation of any of the tiles.

What is the best way to solve this problem.

I tried Simulated Annealing.

 

Here is the programmatic (python) solution to this.

Python code is available here

https://github.com/palsumitpal/sumitpython/blob/master/SimulatedAnnealingPuzzleSolve.py

With my solution – it approaches quite close to the maximum possible

The way to run the code is below

python SimulatedAnnealingPuzzleSolve.py 5 5 20000 Greedy

1st Param (int)- Rows in the Grid

2nd Param (int)- Cols in the Grid

3rd Param (int)- Number of Max Iterations to be done

4th Parameter (String)- Whether to choose the Greedy algorithm or generalized Simulated Annealing – Possible values – Greedy or Any other String ( which chooses Generalized Simulated Annealing )

The output of the program gives the final score – either when it 1 or 2 points from the max possible points or after the program terminates after the Max Number of Iterations.

It also prints in ( Color in Unix Console ) – the arrangement of the tiles.

Sample Output from a Unix Console ( cygwin / unix command prompt – you can see the colors)

On windows command prompt – you will only see the color codes

1 Block is represented as

-Y-

G-B

-R-

[cloudera@localhost code]$ python SimulatedAnnealingPuzzleSolve.py 5 5 20000 Greedy

Initial Pieces

-Y- -G- -R- -Y- -G-

G-B R-Y Y-G B-G Y-B

-R- -B- -B- -R- -R-

-Y- -Y- -Y- -R- -R-

R-G G-B G-R B-G G-B

-B- -R- -B- -Y- -Y-

-R- -B- -R- -R- -G-

B-G Y-G G-Y G-B B-R

-Y- -R- -B- -Y- -Y-

-B- -R- -G- -G- -Y-

Y-R G-Y B-Y R-Y G-B

-G- -B- -R- -B- -R-

-G- -G- -Y- -B- -Y-

B-R B-R R-B Y-R R-B

-Y- -Y- -G- -G- -G-

 

Max Possible Score Possible for GridSize(5,5) = 40

Initial State 13

Max achieved by Simulated Annealing with Steepest Ascent 34

Current Max State After Generalized Simulated Annealing

-Y- -G- -Y- -G- -G-

G-B B-R R-B B-R R-Y

-R- -Y- -G- -Y- -B-

-R- -Y- -G- -Y- -B-

Y-G G-B B-Y B-G Y-R

-B- -R- -R- -R- -G-

-B- -R- -R- -R- -G-

Y-G G-Y B-G G-Y Y-B

-R- -B- -Y- -B- -R-

-R- -B- -Y- -R- -R-

G-B Y-R R-B B-G G-B

-Y- -G- -G- -Y- -Y-

-Y- -G- -G- -Y- -Y-

G-B B-R R-Y G-R R-G

-R- -Y- -B- -B- -B-

 

 

 

GridGame Algorithm Code – Finding a String from Characters in a 2D Grid

Problem Definition – Assume you are given a 2 dimension grid – with each cell in the grid as a character from the ASCII set. Find out an Algorithm and code to find out if a String is present in the Grid by starting from any point in the Grid and moving along any direction in the grid – Up/Down/Left/Right/Diagonally. Loop Around – at the edges is not allowed

Example

If the input matrix is

[a, b, c]

[d, e, f]
[g, h, i]
[j, k, l]
[m, n, o]

and the Input string is – cbdhln

The given string is Present in the grid

While the string – cej

is NOT present in the Grid

Approach

Recursive and brute force keeping the path list of all available paths ( list of coordinates – which are neighbours)

Create a HashMap of each Char in the Grid and the locations it is present in.

So the Key for the HashMap is the Character and the Value – List of all Coordinates(Point) where the char occurs in the grid

Take each char from the input string and compute if this char can be a neighbour of the prev char and if so – keep a list of all such paths – where each path is a list of all Coordinates.

If the string exhausts and we get a path – we return true

If any char from the input string is not present in the HashMap means – the string cannot be formed from the Grid characters.

Otherwise we continue to the next character in the string and keep looking if the current character from the string is neighbour of the last element in each Path.

I am sure there are better algorithms for this – which I will think of more and code as I get more insights.

I have used Point DataStructure from awt java library.

Input to the code is like this

java GridGame 5 3 a b c d e f g h i j k l m n o cbdhln

1st Param – Number of Rows in Grid

2nd Param – Number of Columns in the Grid

3rd Param….(2nd to last Param ) — all the characters in the Grid

The last parameter – is the string to be found

Here is the code in Java

* I have done no argument level parameter testing in the code – since the code assumes that the inputs are all correct.

import java.util.*;
import java.awt.Point;

public class GridGame
{
public static void main(String[] args)
{
System.out.println(args[0]);
System.out.println(args[1]);
GridGame game = new GridGame(Integer.valueOf(args[0]), Integer.valueOf(args[1]), args);
System.out.println(“The given string is ”  + game.isPresent() + ” in the grid”);
}

char[][] inputGrid;
int row, width;
String matchString;

private String isPresent()
{
if (findMatch())
return “Present”;
else
return ” NOT Present”;
}

public GridGame(int r, int w, String[] data)
{
row = r; width = w;
inputGrid = new char[row][];
for(int i=0;i<row;i++)
inputGrid[i] = new char[width];

int index = 2;
for(int i=0;i<row;i++)
for(int j=0;j<width;j++)
inputGrid[i][j] = data[index++].charAt(0);

for(int i=0;i<row;i++)
System.out.println(Arrays.toString(inputGrid[i]));

matchString = data[index];

System.out.println(matchString);
}

HashMap<Character, List<Point>> inverseMap = new LinkedHashMap<Character, List<Point>>();
int currentIndex = 0;
public boolean findMatch()
{
populateInverseMap();
//        printInverseMap();
return findMatch(matchString, currentIndex);
}

private List<Point> lookupHashMap(char x)
{
return inverseMap.get(x);
}

List<List<Point>> currentNeighbourList = new ArrayList<List<Point>>();

private void showCurrentNeighbourList()
{
System.out.println(“Size of currentNeighbourList = ” + currentNeighbourList.size());
for(List<Point> p : currentNeighbourList)
System.out.println(Arrays.toString(p.toArray()));
}

private boolean findMatch(String s, int index)
{
//        System.out.println(index);
//        showCurrentNeighbourList();
if ( index == (s.length()))    return true;
List<Point> currentPoints = lookupHashMap(s.charAt(index));
if (currentPoints == null)        return false;
if (index == 0 )
{
for(Point p : currentPoints)
{
List<Point> l = new ArrayList<Point>();
l.add(p);
currentNeighbourList.add(l);
}
}
else
{
List<List<Point>> updatedNeighbourList = new ArrayList<List<Point>>();
updateNeighbourPoints(updatedNeighbourList, currentPoints);
if (updatedNeighbourList.size() == 0)    // no new neighbours added means no path
return false;
currentNeighbourList = updatedNeighbourList;
}

return findMatch(s, ++index);
}

private void updateNeighbourPoints(List<List<Point>> updateList, List<Point> p)
{
for(List<Point> pList : currentNeighbourList)
{
// Get Last Point in the List
Point p1 = pList.get(pList.size()-1);
for(Point pTest : p)
{
if ( isNeighbour(p1, pTest))
{
List<Point> newList = deepClone(pList);
newList.add(pTest);
updateList.add(newList);
}
}
}

}

private List<Point> deepClone(List<Point> pList)
{
List<Point> newList = new ArrayList<Point>();
for(Point p : pList)
newList.add(new Point(p.x, p.y));
return newList;
}

private void populateInverseMap()
{
for(int i=0;i<row;i++)
{
for(int j=0;j<width;j++)
{
Point p = new Point(i,j);
Character current = inputGrid[i][j];
List<Point> currentLookUpValue = inverseMap.get(current);
if ( currentLookUpValue == null )
{
List<Point> lPoint = new ArrayList<Point>();
lPoint.add(p);
inverseMap.put(current, lPoint);
}
else
{
currentLookUpValue.add(p);
}
}
}
}

private void printInverseMap()
{
for(Character c : inverseMap.keySet())
{
List<Point> p = inverseMap.get(c);
System.out.print(c + ”  “);
printPointList(p);
}
}

private void printPointList(List<Point> pointList)
{
for(Point p : pointList)
System.out.print(p + “,  “);
System.out.println();
}

private boolean isNeighbour(Point p1, Point p2)
{
int p1X = p1.x;
int p1Y = p1.y;

int p2X = p2.x;
int p2Y = p2.y;

if (( Math.abs(p1X-p2X) == 1) && ( Math.abs(p1Y-p2Y) == 1)) return true;    // diagonal test
if (( Math.abs(p1X-p2X) == 1) && ( Math.abs(p1Y-p2Y) == 0)) return true;    // adjacent
if (( Math.abs(p1X-p2X) == 0) && ( Math.abs(p1Y-p2Y) == 1)) return true;    // adjacent

return false;
}
}

Array of A[N] of N numbers. Create Output[N] : Output[i] = multiplication of all the elements of A[N] except A[i]. Solve it without division operator in O(n)

Here is the simple Algorithm explanation and code in ( python ) for the problem given here

Problem

Try solving this in O(n)..

You are given a list of A[N] of N numbers. Create an array Output[N] such that Output[i] will be equal to multiplication of all the elements of A[N] except A[i]. Solve it without division operator and in O(n). For example Output[0] will be multiplication of A[1] to A[N-1] and Output[1] will be multiplication of A[0] and from A[2] to A[N-1].

http://www.linkedin.com/groups/Try-solving-this-in-O-1920463.S.165328313?qid=cf1bdabc-e4e1-489e-8557-e3ba125bb4cd&trk=group_most_popular-0-b-ttl&goback=%2Egmp_1920463

My Approach to this problem

Do 2 iterations of the array – starting once from the beginning of the array and once from the end and during each iteration calculate the running multiplication and store the results in 2 different arrays – pre[] and post[]

The results as asked in the algorithm is multiplication is the current index location pre[j-1] * post[j+1] – A special handling of the 2 edge cases when the index is at the end of the array is needed

For Example

if the input array is

pre = [7, 7, 14, 42, 168, 840, 5040]
post = [5040, 720, 720, 360, 120, 30, 6]

Python Code

l = [7,1,2,3,4,5,6]
pre=[]
po=[]

# Create an array which is a running multiple from start of the array
for j in range(0, len(l)):
if j == 0:
multiple = l[j]
else:
multiple = multiple * l[j]
pre.append(multiple)
print pre

# Create an array which is a running multiple from end of the array
for j in range(len(l)-1, -1, -1):
if (j == len(l) – 1):
multiple = l[j]
else:
multiple = multiple*l[j]
po.append(multiple)
po.reverse()

print po

# result of each multiplication is the current index location pre[j-1] * po[j+1]
# special handling of the 2 edge cases
for j in range(0, len(l)):
if j == 0:
print po[j+1]
elif (j == (len(l) – 1)):
print pre[j-1]
else:
print pre[j-1]*po[j+1]