
Harnessing the Power of C++ STL in Python with CSTL

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Introduction

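Python, with its simplicity and readability, has become the language of choice for many developers. However, it has limitations, especially when it comes to memory management in multiprocessing applications. When a process forks, the child shares the parent's memory pages under Copy-on-Write (CoW). CPython, however, stores a reference count inside every object and updates it whenever a new reference to the object is taken, which happens on ordinary reads. Simply iterating over a large list or dict in a worker therefore writes to the shared pages, and the operating system copies them one by one. Memory usage grows as if the program were leaking, until the workers together exhaust the available memory and the program crashes.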
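To see why merely reading data triggers copying, the standard-library sketch below (a plain list, not CSTL) measures an element's reference count before and after the element is read into a new local name. The helper name `refcount_delta_on_read` is made up for illustration. Each such read writes to the object's header. In a forked worker that header sits on a page shared with the parent, so the write forces the kernel to copy the page.

```python
import sys

def refcount_delta_on_read(container):
    """Return how much the first element's reference count grows when
    code reads it into a new local name."""
    item = container[0]              # reference held for measurement
    before = sys.getrefcount(item)
    reader = container[0]            # a plain read creates one more reference
    after = sys.getrefcount(item)
    del reader
    return after - before

# object() instances are ordinary (non-immortal) objects, so their
# reference counts change as they are read
data = [object() for _ in range(1000)]
print(refcount_delta_on_read(data))  # prints: 1
```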

CSTL is a powerful tool that brings the C++ Standard Template Library (STL) to Python. It lets Python developers use STL containers such as std::vector, std::unordered_map, and std::unordered_set in place of their Python counterparts (list, dict, and set).

Why CSTL?

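CSTL is designed to solve the CoW problem in Python and to provide an efficient alternative wherever a standard C++ container is needed. It wraps several C++ STL containers with a native C++ implementation. The elements are stored as plain C++ values rather than as individual Python objects, so reading them does not update any reference counts and the shared pages are not copied. This avoids the CoW problem that affects Python's native list and dict.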
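The difference between storing Python objects and storing raw values can be seen without CSTL. The sketch below uses the standard-library `array` module as a stand-in for a C++-backed container. This is an analogy, not CSTL itself. A `list` hands back the same shared object on every read, which means that object's reference count is touched. An `array` keeps plain machine integers in one buffer and builds a fresh Python `int` for each read, so the stored data itself is never written to.

```python
from array import array

BIG = 10**6  # above CPython's small-int cache, so new ints are distinct objects

py_list = [BIG] * 4               # every list slot points at one shared int object
raw_buf = array("q", [BIG] * 4)   # "q": signed 64-bit values stored inline

# Reading from a list returns the stored object itself.
print(py_list[0] is py_list[0])   # prints: True
# Reading from the array boxes the raw value into a new int each time.
print(raw_buf[0] is raw_buf[0])   # prints: False
print(raw_buf[0] == py_list[0])   # prints: True
```

The assumption carried over to CSTL is that its accessors likewise convert each C++ value into a new Python object on read, leaving the container's own memory untouched.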

Installing CSTL

CSTL can be easily installed via pip using the command:

```shell
pip install cstl
```

On Windows or macOS, or if you want to build from source, first install SWIG, then clone the repository from GitHub and compile it:

```shell
conda install swig
git clone https://github.com/fuzihaofzh/cstl.git
cd cstl
./build.sh
python setup.py install --single-version-externally-managed --record files.txt
```
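After either installation route, you can check whether Python finds the package with the standard-library sketch below. The module and distribution name `cstl` come from the commands above. The helper `cstl_install_status` is hypothetical.

```python
import importlib.metadata
import importlib.util

def cstl_install_status(module="cstl", dist="cstl"):
    """Return None if the module cannot be found, otherwise its installed
    version ("unknown" if importable but without package metadata)."""
    if importlib.util.find_spec(module) is None:
        return None
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        # e.g. a build that is importable from the current directory
        return "unknown"

status = cstl_install_status()
print("cstl not installed" if status is None else f"cstl version: {status}")
```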

Using CSTL

CSTL provides a seamless way to convert Python objects into CSTL objects and vice versa. Here’s an example:

```python
import cstl

# Directly convert containers from Python
v = cstl.frompy({"1":[1,2,3], "2":[4,5,6]}) # convert Python object to cstl object
v["1"][2] = 10 # access cstl object
pv = cstl.topy(v) # convert cstl object to Python object
print(pv) # prints: {'1': [1, 2, 10], '2': [4, 5, 6]}
```

CSTL also lets you choose the container type explicitly, for example cstl.VecInt() for a vector of integers or cstl.VecMapIntFloat() for a vector of maps with integer keys and float values. These containers support indexing, item assignment, and append, much like their Python counterparts.

```python
import cstl

vec = cstl.VecInt([1,2,3,4,5,6,7])
print(vec[2]) # prints: 3

vec[2] = 1
print(vec[2]) # prints: 1

vec.append(10)
print(vec[-1]) # prints: 10

# A std::vector is not a Python list; convert it explicitly when you need one:
print(list(vec)) # prints: [1, 2, 1, 4, 5, 6, 7, 10]

vmif = cstl.VecMapIntFloat([{1:3.4},{4:5.5}])
# Float is a 32-bit C++ float, so 3.4 comes back rounded to single precision
print(vmif[0][1]) # prints: 3.4000000953674316
```
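The value 3.4000000953674316 printed for `vmif[0][1]` is what a 32-bit `float` produces. 3.4 has no exact binary representation, and single precision keeps fewer bits than Python's 64-bit float. The standard-library sketch below reproduces the value by packing 3.4 into four bytes and unpacking it again. It does not use CSTL, and the helper name `as_float32` is made up for illustration.

```python
import math
import struct

def as_float32(x):
    """Round-trip a Python float through a 32-bit IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", x))[0]

v = as_float32(3.4)
print(v)                                   # prints: 3.4000000953674316
print(v == 3.4)                            # prints: False
print(math.isclose(v, 3.4, rel_tol=1e-6))  # prints: True
```

When comparing values read back from a Float container, a tolerance check like `math.isclose` is safer than `==`.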

Supported Data Types and Containers

CSTL supports int, std::int64_t, std::string, float, double, and bool as element types. Containers can be nested up to three levels deep; supporting more levels requires modifying the source code.
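To make the type and nesting rules concrete, here is a hypothetical helper, `infer_type_name`, that derives a CSTL-style class name from a sample Python value, following the pattern of `cstl.VecInt` and `cstl.VecMapIntFloat`. It is not part of CSTL. The scalar names (`Bool`, `Int`, `Float`, `String`) and the `Vec`/`Map`/`Set` prefixes for list/dict/set are assumptions, and the helper inspects only the first element of each container. The depth check mirrors the three-level nesting limit.

```python
MAX_DEPTH = 3  # nesting limit described above

# Assumed mapping from Python scalar types to CSTL-style element names.
# bool comes first because bool is a subclass of int in Python.
SCALARS = [(bool, "Bool"), (int, "Int"), (float, "Float"), (str, "String")]

def infer_type_name(value, depth=0):
    """Hypothetical: build a name like 'VecMapIntFloat' from a sample value."""
    for py_type, name in SCALARS:
        if isinstance(value, py_type):
            return name
    if depth >= MAX_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
    if isinstance(value, list) and value:
        return "Vec" + infer_type_name(value[0], depth + 1)
    if isinstance(value, set) and value:
        return "Set" + infer_type_name(next(iter(value)), depth + 1)
    if isinstance(value, dict) and value:
        key, val = next(iter(value.items()))
        return ("Map" + infer_type_name(key, depth + 1)
                + infer_type_name(val, depth + 1))
    raise TypeError(f"cannot infer a type for {value!r}")

print(infer_type_name([1, 2, 3]))             # prints: VecInt
print(infer_type_name([{1: 3.4}, {4: 5.5}]))  # prints: VecMapIntFloat
```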

Performance Comparison

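CSTL is slower than Python's native list and NumPy on basic operations such as add1, read, and sliceread, but it wins on others. It is the only one of the four to finish pop in under 10. On append it stays close to list, while NumPy and PyTorch exceed 10. The table below compares CSTL with Python, NumPy, and PyTorch. Entries are running times (lower is better), and >10 marks runs that took longer than 10: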

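| Operation | Python | NumPy | CSTL | PyTorch |
|-----------|--------|-------|-------|---------|
| add1 | 0.19 | 0.28 | 0.911 | 4.714 |
| read | 0.161 | 0.2 | 0.526 | 1.033 |
| sliceread | 0.327 | 0.264 | 0.683 | 1.381 |
| append | 0.204 | >10 | 0.351 | >10 |
| pop | >10 | >10 | 0.595 | >10 |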
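The post does not spell out the exact workloads behind add1, read, sliceread, append, and pop. The sketch below therefore only shows how such a comparison could be timed with the standard-library `timeit` module. The `add1_loop` workload (adding 1 to every element through indexing), the sizes, and the helper names are assumptions. The commented-out CSTL line further assumes that `cstl.VecInt` supports `len()` and item assignment.

```python
import timeit

def add1_loop(container):
    """Assumed 'add1'-style workload: increment every element in place."""
    for i in range(len(container)):
        container[i] += 1

def time_workload(make_container, workload, repeat=3, number=5):
    """Return the fastest per-run average (seconds) across `repeat` trials,
    each on a freshly built container."""
    times = []
    for _ in range(repeat):
        container = make_container()
        elapsed = timeit.timeit(lambda: workload(container), number=number)
        times.append(elapsed / number)
    return min(times)

n = 100_000
print(f"list: {time_workload(lambda: list(range(n)), add1_loop):.4f} s")
# Hypothetical CSTL comparison (requires cstl to be installed):
# import cstl
# print(time_workload(lambda: cstl.VecInt(list(range(n))), add1_loop))
```

Building a fresh container for each trial keeps the trials independent. Within a trial, the same container is reused, which only makes the stored values larger and does not change the amount of work.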

Most importantly, CSTL solves the CoW problem and offers more data structures than NumPy (maps and sets as well as vectors), making it a valuable tool for Python developers.

Conclusion

CSTL is a powerful tool that lets Python developers use the containers of the C++ STL. It solves the CoW problem and offers more data structures than NumPy. If you're writing memory-intensive multiprocessing programs, or need container operations such as pop that are slow with Python lists, NumPy, or PyTorch, CSTL is definitely worth a look.

Check out the CSTL repository on GitHub (https://github.com/fuzihaofzh/cstl) for more information and to start using it in your projects.