Launch of the scikit-learn initiative, a reference software library for machine learning
Inria is pleased to announce the launch of the scikit-learn initiative, whose objective is to speed up, with the support of user companies, the development of this reference software by adding new functionalities. Scikit-learn is a library developed in Python, an object-oriented programming language. It is dedicated to statistical learning (machine learning) and can be used as middleware, especially for prediction tasks.
Ten years of research
Initially launched in 2007 by members of the Python scientific community, the scikit-learn project really developed within the framework of research work on functional brain imaging, conducted within Inria's Parietal team. The team needed a predictive modelling tool that integrated with the Python ecosystem. It then organised an open participatory development workshop with the objective of implementing open source statistical data analysis methods. Two years later, a stable version was released.
Clear objectives were set at the start of the project. So that the library could be easily installed on different platforms, the development team made sure that it was well packaged and wrote extensive documentation with concrete examples on the use of the tool. It also insisted that all methods be covered by a series of automatic tests that help ensure the quality of the code base over the long term.
Retrieve complex data for classification
Scikit-learn is used to retrieve complex data structures (texts and images) and classify them using state-of-the-art techniques.
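As an illustration of what classifying texts can look like in practice, here is a minimal sketch. The six example messages and their "spam"/"ham" labels are invented for this illustration and do not come from any real project; the choice of a TF-IDF representation followed by logistic regression is likewise an assumption, one common option among many that the library offers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up corpus (illustrative only, not real data).
texts = [
    "win a free prize now",
    "free money claim your prize",
    "cheap pills limited offer",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch with the project team tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Chain the text vectoriser and the classifier into a single estimator.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

# Classify two new, unseen messages.
new_messages = ["claim your free prize", "agenda for the team meeting"]
predictions = classifier.predict(new_messages)
print(list(zip(new_messages, predictions)))
```

A real application would of course train on far more data and evaluate the model on held-out examples before relying on its predictions.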
Scikit-learn is open source and available under the BSD license. A community of developers (inside and outside Inria) quickly formed, which made it possible to accelerate the development of the tool and promote applications, particularly in the processing of time series. A regularly updated website (scikit-learn.org) provides a detailed introduction to the product and its applications.
Scikit-learn is used by a large number of Web companies to predict user buying behaviour, offer product recommendations and detect trends and abusive behaviour (fraud, spam, etc.).
Diversified fields of application
One of scikit-learn's strong points is its generic nature, which ensures great versatility and diverse applications, such as:
- fight against fraud and spam
- e-mailing and marketing campaigns
- prediction of user behaviour
- optimisation of industrial and logistic processes.
For example, an application for the general public, such as a service for reserving tourist accommodation, can rely on machine-learning tools such as scikit-learn to automate tasks. A data scientist is needed to understand the applications and the data they generate so that the data processing systems can be programmed efficiently.
Scikit-learn is a constantly evolving statistical learning library that is easy to use, effective and accessible to non-experts in data science. In the data mining stage, the user enters a few lines in an interactive interface and can immediately view the results of their query.
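To give an idea of what "a few lines" can mean, the sketch below trains and evaluates a model on the small iris dataset that ships with the library. The dataset, the random forest model and the parameter values are assumptions chosen for illustration; they are not prescribed by the project.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small example dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Keep a quarter of the samples aside to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model, then measure its accuracy on the held-out samples.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Accuracy on held-out data: {accuracy:.2f}")
```

Typed into an interactive Python session or a notebook, these lines return the result immediately, and swapping in another estimator usually only requires changing the line that creates the model.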
Scikit-learn is supported by a large team of developers based in Paris, as well as in New York, Australia and around the world. It is in the top three most popular machine learning software programs on GitHub.
The scikit-learn consortium
To accompany and stimulate the scikit-learn ecosystem, a consortium of patrons has been created with the support of the Inria Foundation. It can thus help development engineers to ensure the quality of the project and the integration of new contributions, as well as the addition of ambitious new features, all in connection with and for the benefit of its vast community of users and developers.
Consortium members (BCG Gamma, Microsoft, Axa, BNP Paribas Cardif, Intel, Nvidia and Dataiku) and initiative partners are involved as supporters and patrons in defining development priorities and the project’s public profile.