decouple function and class into separate files

I have a file with about a thousand lines of code and I'd like to break it into several files. However, I found that the functions depend on each other, so I have no idea how to decouple them. Here is a simplified example:

import numpy as np

def tensor(data):
    return Tensor(data)

class Tensor:
    def __init__(self,data):
        self.data=data
    def __repr__(self):
        return f'Tensor({str(self.data)})'
    def mean(self):
        return mean(self.data)

def mean(data):
    value=np.mean(data)
    return tensor(value)

What is the best way to separate tensor, Tensor, and mean (put them into 3 different files)? Thanks for your help!!


Solution 1:

Having a module that is thousands of lines long isn't that bad. You may not actually need to break it up into different modules. It is common to have a module that has a function alongside a class like your tensor and Tensor in the same module, and there is no reason for mean to be split up into a separate function as that code can just be placed directly in Tensor.mean.
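As a rough sketch of what folding mean into Tensor.mean could look like, here is the whole example as a single module. To keep the sketch runnable without any third-party install, statistics.fmean stands in for np.mean; that swap is my assumption and not part of the original code.

```python
from statistics import fmean  # assumption: stdlib stand-in for np.mean


class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({self.data})'

    def mean(self):
        # The averaging logic lives in the method, so no module-level mean() is needed.
        return Tensor(fmean(self.data))


def tensor(data):
    # Small factory function kept next to the class it constructs.
    return Tensor(data)


result = tensor([1, 2, 3]).mean()
print(result)  # prints Tensor(2.0)
```

With everything in one module there is nothing left to import in a circle, which is part of why splitting may not be worth it here.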

A module should have a specific purpose and be a self-contained unit around that purpose. If you are splitting things up just to have smaller files, then that is only going to make your codebase worse. However, large modules are a sign that things may need to be refactored. If you can find good ways of refactoring ideas in the code into smaller ideas, then those smaller units could be given their own modules. Otherwise, keep everything as one bigger module.

As for how you can split up code that is coupled together, here is one way of splitting the code into the modules you indicated. Since you have a function, the tensor function, that you would like people to use to get an instance of your Tensor class, creating a Python package seemed somewhat sensible, since packages come with an __init__.py file that is used for establishing the API, i.e. your tensor function. I put the tensor function directly in the __init__.py file, but if the function is pretty large, it can be broken out into a separate module, since the __init__.py file is just supposed to give you an overview of the API being created.

# --- main.py ----

from tensor import tensor

print(tensor([1, 2, 3]).mean())

# --- tensor/__init__.py ----

'''
Add some documentation here
'''

def tensor(data):
    return Tensor(data)

# Imported after tensor() is defined so that helper can find it during the circular import.
from tensor.Tensor import Tensor

# --- tensor/Tensor.py ----

from tensor import helper

class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({str(self.data)})'

    def mean(self):
        return helper.mean(self.data)

# --- tensor/helper.py ----

import numpy as np
from . import tensor

def mean(data):
    value = np.mean(data)
    return tensor(value)

About circular dependencies

The modules here import each other in a cycle (__init__ imports Tensor, Tensor imports helper, and helper imports from __init__ again), and this is OK. When Python reaches an import of a module that is already in the middle of loading, it does not start loading it a second time; the import just receives the partially loaded module, the inner module finishes loading normally, and then the outer one finishes. You only run into problems with circular dependencies when code at module level (code outside of your functions and classes) is executed while the module is first loading and depends on functionality in another module that is only partially loaded.
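Here is a small, self-contained sketch of both cases. The module names (cyc_a, cyc_b, bad_a, bad_b) are made up for illustration, and the script writes them to a temporary directory so it can be run as a single file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical modules, written to a temporary directory before any import.
SOURCES = {
    # Harmless cycle: nothing at module level uses the other module.
    "cyc_a.py": "import cyc_b\n\ndef greet():\n    return 'hello from cyc_a'\n",
    "cyc_b.py": "import cyc_a\n\ndef call_a():\n    return cyc_a.greet()\n",
    # Harmful cycle: bad_b calls into bad_a while bad_a is only partially loaded.
    "bad_a.py": "import bad_b\n\ndef greet():\n    return 'hello from bad_a'\n",
    "bad_b.py": "import bad_a\n\nVALUE = bad_a.greet()\n",
}

with tempfile.TemporaryDirectory() as root:
    for name, source in SOURCES.items():
        Path(root, name).write_text(source)
    sys.path.insert(0, root)
    try:
        # The cycle resolves because greet() is only looked up when call_a() runs.
        cyc_b = importlib.import_module("cyc_b")
        ok_result = cyc_b.call_a()

        # Here the lookup happens at import time, before greet() has been defined.
        bad_error = None
        try:
            importlib.import_module("bad_a")
        except AttributeError as exc:
            bad_error = exc
    finally:
        sys.path.remove(root)

print(ok_result)
print(type(bad_error).__name__)
```

The only difference between the two pairs is whether the cross-module call sits inside a function or at module level.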

Using classes that don't exist yet

I can add the following to the __init__.py file

def some_function(): 
    return DoesntExist()

and your code would still run. It doesn't look for a class named Tensor until it is actually running the tensor function. If we did the following then we would get an error about Tensor not existing.

def tensor(data):
    return Tensor(data)
tensor([1, 2, 3])  # NameError: Tensor has not been imported yet
from tensor.Tensor import Tensor

because now we are running the tensor function before the import and it can't find anything named Tensor.
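The same behaviour can be seen in a single file, without any packages involved. This is only a sketch; make_widget and Widget are made-up names.

```python
def make_widget():
    # Widget does not exist yet. Python only looks the name up when this body runs.
    return Widget()


# Calling too early fails, because the lookup happens now.
try:
    make_widget()
    early_error = None
except NameError as exc:
    early_error = str(exc)


class Widget:
    pass


# Once the name exists, the very same function works.
widget = make_widget()
print(early_error)
print(type(widget).__name__)
```

Defining make_widget before Widget is fine; only calling it before Widget exists is a problem.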

The order of stuff in __init__

If you switch the order around you will have

__init__ imports Tensor imports helper imports __init__ again

as it tries to grab the tensor function, but it can't, because the __init__ file can't proceed past the line that imports Tensor until that import has completed, so the tensor function has not been defined yet.

Now with the current order we have,

__init__ defines tensor, sees the import statement, and saves its current progress as a partial import. The same imports happen (__init__ imports Tensor imports helper imports __init__ looking for a tensor function). This time we look at the partial import for the tensor function, find it, and are able to continue on using that.
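To see the effect of the ordering in isolation, here is a reduced sketch with a shorter cycle (__init__ imports helper, helper imports from __init__). The package names pkg_ok and pkg_bad and the functions make and use are hypothetical; the two packages differ only in the order of the lines in __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Shared helper source: it asks the package for make at import time.
HELPER = "from . import make\n\ndef use():\n    return make()\n"

SOURCES = {
    # make is defined before the import that triggers the cycle.
    "pkg_ok/__init__.py": "def make():\n    return 'made'\n\nfrom pkg_ok.helper import use\n",
    "pkg_ok/helper.py": HELPER,
    # The import runs first, so make does not exist yet when helper asks for it.
    "pkg_bad/__init__.py": "from pkg_bad.helper import use\n\ndef make():\n    return 'made'\n",
    "pkg_bad/helper.py": HELPER,
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in SOURCES.items():
        target = Path(root, rel_path)
        target.parent.mkdir(exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, root)
    try:
        ok_result = importlib.import_module("pkg_ok").use()

        bad_error = None
        try:
            importlib.import_module("pkg_bad")
        except ImportError as exc:
            bad_error = exc
    finally:
        sys.path.remove(root)

print(ok_result)
print(type(bad_error).__name__)
```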

I didn't think about any of that when I put things in that order. I just wrote out the code, got the circular import error, switched the order around, and didn't think about what was going on until you asked about it.

And now that I think about it, the following would have worked too, and with it the order of things in the __init__ file no longer matters.

from tensor.Tensor import Tensor

def tensor(data):
    return Tensor(data)

And then in helper.py

import numpy as np
import tensor

def mean(data):
    value=np.mean(data)
    return tensor.tensor(value)

The difference is that instead of requiring the tensor function to exist at the moment the module is imported (which is what from . import tensor does), we are doing import tensor, which imports the tensor package and not the function. Then, whenever the mean function gets run, tensor.tensor(value) looks up the tensor function inside our tensor package.
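A reduced sketch of this variant, again with made-up names (pkg_late, make, use): the import sits at the top of __init__.py, before make is defined, and it still works because helper only holds a reference to the package and looks up make at call time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

SOURCES = {
    # The import comes first; make is defined only afterwards.
    "pkg_late/__init__.py": "from pkg_late.helper import use\n\ndef make():\n    return 'made'\n",
    # helper imports the package itself, not a name from it.
    "pkg_late/helper.py": "import pkg_late\n\ndef use():\n    return pkg_late.make()\n",
}

with tempfile.TemporaryDirectory() as root:
    for rel_path, source in SOURCES.items():
        target = Path(root, rel_path)
        target.parent.mkdir(exist_ok=True)
        target.write_text(source)
    sys.path.insert(0, root)
    try:
        pkg_late = importlib.import_module("pkg_late")
        # By the time use() runs, the package has finished loading and make exists.
        result = pkg_late.use()
    finally:
        sys.path.remove(root)

print(result)  # prints made
```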