I don't think this should aim at the general beat tracking problem,
especially not in the early stages.
I already did some experiments on this, and my preliminary conclusion is
that the most reliable way is to start from user input (i.e. some
tapping). If you have this as a basic hint to start from, it is not that
difficult to get into sync & stay there.
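To illustrate what "start from tapping, then keep sync" could look like, here is a minimal sketch. It is not the code from the experiments above. It assumes tap times and detected onset times are already available in seconds. The function names `period_from_taps` and `track_beats` and the `window`, `gain` and `tempo_gain` parameters are hypothetical. The taps give the initial tempo (median inter-tap interval) and phase (last tap). Each following beat is then predicted one period ahead and nudged toward the nearest detected onset, if there is one close enough.

```python
from statistics import median


def period_from_taps(taps):
    """Estimate the beat period (seconds) from user tap times (seconds)."""
    if len(taps) < 2:
        raise ValueError("need at least two taps")
    intervals = [b - a for a, b in zip(taps, taps[1:])]
    # Median is robust against a single sloppy tap.
    return median(intervals)


def track_beats(taps, onsets, until, window=0.1, gain=0.5, tempo_gain=0.1):
    """Predict beats after the last tap up to time `until`.

    Hypothetical sketch: each prediction is pulled toward the nearest
    onset within +/- `window` seconds (phase correction, `gain`), and the
    period is adapted slowly by the same error (`tempo_gain`). With no
    nearby onset the tracker simply free-runs at the current period.
    """
    period = period_from_taps(taps)
    beats = []
    beat = taps[-1]  # the last tap sets the initial phase
    while True:
        predicted = beat + period
        if predicted > until:
            break
        nearby = [o for o in onsets if abs(o - predicted) <= window]
        if nearby:
            nearest = min(nearby, key=lambda o: abs(o - predicted))
            error = nearest - predicted
            predicted += gain * error      # pull phase toward the onset
            period += tempo_gain * error   # slow tempo adaptation
        beats.append(predicted)
        beat = predicted
    return beats
```

For example, `track_beats([0.0, 0.5, 1.0, 1.5], [2.04, 2.54, 3.04], until=3.1, tempo_gain=0.0)` starts at a 0.5 s period and moves each beat part of the way toward the slightly late onsets. The point of the tap hint is that the tracker only has to search a small window around an already plausible prediction. It never has to solve the open-ended tempo/phase search that general beat tracking faces.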
But I don't have time to take on another project...