Hi all,
I wanted to continue the discussion from the Atomic Game Engine announcement thread, roughly from page 9:
http://discourse.urho3d.io/t/atomic-game-engine-mit-urho3d-fork/643/1
@cadaver It was interesting to see your take on how engines like Unreal/Unity are designed in relation to Urho, and your mention of Naughty Dog’s engine. I discovered this the other day as well:
https://github.com/JodiTheTigger/sewing
It’s basically an implementation of their fiber-based co-routine thread pool, which uses the assembly from Boost.Context to do fast context switches.
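To make the model concrete, here’s a minimal sketch of the core idea from that talk (batches of small jobs plus a counter you wait on), using plain threads; all the names are mine, and sewing’s actual API differs. In a real fiber scheduler the wait would suspend the current fiber with a cheap context switch instead of occupying a worker thread:

```cpp
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Job batch + atomic counter, as in the Naughty Dog model: submit a
// batch, keep working, then wait on the counter when you need results.
struct JobBatch
{
    std::atomic<int> pending{0};
    std::vector<std::thread> workers;

    void Run(std::vector<std::function<void()>> jobs)
    {
        pending = (int)jobs.size();
        for (auto& job : jobs)
            workers.emplace_back([this, job] { job(); --pending; });
    }

    void Wait()
    {
        // Busy-wait for brevity; a fiber scheduler would instead switch
        // this fiber out and run other queued jobs on the same thread.
        while (pending > 0)
            std::this_thread::yield();
        for (auto& w : workers)
            w.join();
        workers.clear();
    }
};

int main()
{
    JobBatch batch;
    batch.Run({
        [] { std::puts("animate chunk 0"); },
        [] { std::puts("animate chunk 1"); },
    });
    batch.Wait(); // both jobs have completed past this point
}
```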
Anyways, if one were to experiment with this way of designing an engine, I’m trying to gauge how much refactoring would be needed in Urho3D versus starting from scratch (but I’ll repeat, as an experiment; it would most definitely break existing features like immediate physics steps, as you mentioned). You did mention that without those it would be easier to multi-thread the engine. You know it best: what other potential gotchas do you think one would need to know about, and what would be the best way to start? In my head, decoupling the renderer so it is isolated from game logic is one step (the proxy approach sounds like what most engines do to separate the game logic world/scene from the renderer, same for Stingray/Bitsquid):
http://bitsquid.blogspot.ca/2016/09/state-reflection.html
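If I understand that post right, the gist is something like the following (hypothetical names, nothing Urho3D-specific): the game world never calls into the renderer directly, it just appends small messages describing what changed, and the render thread replays them into its own mirrored scene:

```cpp
#include <cstdint>
#include <vector>

// One "state reflection" message: what changed, on which object.
enum class ReflectOp : uint8_t { Create, Destroy, SetTransform };

struct ReflectMsg
{
    ReflectOp op;
    uint32_t objectId;
    float transform[16]; // only meaningful for SetTransform
};

struct StateStream
{
    std::vector<ReflectMsg> messages;

    // Game thread: record a change instead of touching the renderer.
    void Push(const ReflectMsg& msg) { messages.push_back(msg); }

    // Render thread: replay into the mirrored render world, then clear.
    // RenderWorld is left as a template parameter since its shape is
    // exactly what the refactor would have to decide.
    template <typename RenderWorld>
    void Replay(RenderWorld& world)
    {
        for (const ReflectMsg& msg : messages)
            world.Apply(msg);
        messages.clear();
    }
};
```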
I think to be in line with Vulkan/Direct3D12’s command buffers/lists, the refactor to Graphics.h would be to introduce another object that manages draw/state commands, so you could have multiple of them being recorded on multiple threads. On older APIs this could be a purely emulated object that never touches the hardware API, caching the commands as messages/buffers to be deferred and played back on the render or main thread. The potential problem is that anything which needs to modify vertex buffers would have to be carefully managed, especially things like blend shapes, which I think are updated in an immediate fashion right before the draw call (arguably a better/more modern approach could be used there instead).
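Something like this is what I have in mind (again, hypothetical names, not existing Urho3D classes): recording just appends plain structs to a per-thread buffer, and only playback, on the thread that owns the device, touches the real API; on Vulkan/D3D12 the same front end could translate to native command buffers instead:

```cpp
#include <cstdint>
#include <vector>

struct DrawCmd
{
    enum Type : uint8_t { SetVertexBuffer, Draw } type;
    uint32_t a, b; // command-specific payload (e.g. handle, start, count)
};

class CommandList
{
public:
    // Recording: safe from any worker thread, given one list per thread.
    void SetVertexBuffer(uint32_t handle) { cmds_.push_back({DrawCmd::SetVertexBuffer, handle, 0}); }
    void Draw(uint32_t start, uint32_t count) { cmds_.push_back({DrawCmd::Draw, start, count}); }

    // Playback: called only on the render/main thread, which alone owns
    // the real graphics device (a stand-in template parameter here).
    template <typename GraphicsDevice>
    void Execute(GraphicsDevice& gfx)
    {
        for (const DrawCmd& c : cmds_)
        {
            switch (c.type)
            {
            case DrawCmd::SetVertexBuffer: gfx.SetVertexBuffer(c.a); break;
            case DrawCmd::Draw:            gfx.Draw(c.a, c.b);       break;
            }
        }
        cmds_.clear();
    }

private:
    std::vector<DrawCmd> cmds_;
};
```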
With the renderer proxy separation in place, perhaps the data-oriented stuff could be applied there, since that is where the tight loops for things like frustum culling run; the ‘fat’ entity/component system, scattered in memory, might then not be such a big deal in terms of cache coherency for gameplay code (as a workaround you could always nest a new component that is cache-coherent for what you want to do, like a crowd system or particles; I’ve seen this done by making a custom ‘shape’ in Maya for a crowd system). The other bottleneck I can think of is that bones for animated skeletons are part of these ‘fat’ nodes, so my other experiment was to use ozz-animation for that task; however, you would lose flexibility there, e.g. setting ragdoll collision boxes on bones, or you would have to implement those differently.
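For example, the culling data could live in packed arrays on the renderer side, so the tight loop streams through contiguous memory no matter how fat the scene-graph nodes are (illustrative types only; Urho3D’s actual Drawable/Octree code is organised differently):

```cpp
#include <cstdint>
#include <vector>

struct BoundingSphere { float x, y, z, radius; };
struct Plane { float nx, ny, nz, d; }; // normal + signed distance

struct CullingProxies
{
    std::vector<BoundingSphere> bounds; // packed, index-matched arrays
    std::vector<uint32_t> drawableIds;  // maps back to the fat scene objects

    // Tight loop: reads only the packed bounds, appends visible ids.
    void Cull(const Plane* planes, int numPlanes, std::vector<uint32_t>& visible) const
    {
        for (size_t i = 0; i < bounds.size(); ++i)
        {
            const BoundingSphere& s = bounds[i];
            bool inside = true;
            for (int p = 0; p < numPlanes; ++p)
            {
                const Plane& pl = planes[p];
                // Fully behind this plane: outside the frustum.
                if (pl.nx * s.x + pl.ny * s.y + pl.nz * s.z + pl.d < -s.radius)
                {
                    inside = false;
                    break;
                }
            }
            if (inside)
                visible.push_back(drawableIds[i]);
        }
    }
};
```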
The other thing is that I was taking a look at how PhysX lets you run its tasks in your own thread pool by implementing their dispatcher interface:
http://docs.nvidia.com/gameworks/content/gameworkslibrary/physx/guide/Manual/Threading.html
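From those docs, the integration point appears to be PxCpuDispatcher: you hand PhysX an object that forwards its tasks to your own workers. A rough sketch, where MyThreadPool is a hypothetical stand-in for whatever job system the engine ends up with:

```cpp
#include <functional>
#include <PxPhysicsAPI.h>

// Hypothetical engine-side thread pool, standing in for the real job system.
struct MyThreadPool
{
    void Enqueue(std::function<void()> job);
    unsigned WorkerCount() const;
};

class EngineCpuDispatcher : public physx::PxCpuDispatcher
{
public:
    explicit EngineCpuDispatcher(MyThreadPool& pool) : pool_(pool) {}

    void submitTask(physx::PxBaseTask& task) override
    {
        // PhysX only requires that run() is eventually called on the
        // task, followed by release().
        pool_.Enqueue([&task]
        {
            task.run();
            task.release();
        });
    }

    physx::PxU32 getWorkerCount() const override { return pool_.WorkerCount(); }

private:
    MyThreadPool& pool_;
};
```

You’d then assign an instance to PxSceneDesc::cpuDispatcher when creating the scene, so PhysX’s simulation tasks run alongside everything else in the same pool.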
And Bullet recently got patches to re-implement its threading support for v2.x:
https://github.com/bulletphysics/bullet3/pull/390
I’d love to work on some experiments here, but being pragmatic, I know my job will prevent me from doing much, as I’m time-poor. Having a focus through a discussion is important; apologies if there is already a thread about this. This post is already quite long, but I thought I’d just put down my thoughts. Cheers!